Efficient top-k processing over query-dependent functions

Authors

  • Lin Guo
  • Sihem Amer-Yahia
  • Raghu Ramakrishnan
  • Jayavel Shanmugasundaram
  • Utkarsh Srivastava
  • Erik Vee
Abstract

We study the efficient evaluation of top-k queries over data items, where the score of each item is dynamically computed by applying an item-specific function whose parameter value is specified in the query. For example, online retail stores rank items by price, which may be a function of the quantity being queried: “Stay 3 nights, get a 15% discount on double-bed rooms.” Similarly, when online maps rank possible routes by predicted congestion level, the score (congestion) is a function of the time being queried, e.g., “At 5PM on a Friday in Palo Alto, the congestion level on 101 North is high.” Since the parameter (in the above examples, the number of nights or the time at which the online map is queried) is only known at query time, and online applications have stringent response-time requirements, it is infeasible to evaluate every item-specific function to determine the item scores, especially when the number of items is large. Further, space considerations make it infeasible to pre-compute and store the score of each item for each value of the input parameter. In this paper, we develop a novel technique that compresses the (large) set of item scores for all parameter values by dividing the parameter range into intervals, taking into account the expected query workload. This compressed representation is then used to perform top-k pruning of query results. Our experiments show that the proposed techniques are scalable and efficient.
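The interval idea can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes integer parameter values, scores where higher is better, and interval boundaries supplied by the caller (the paper instead chooses intervals from the expected query workload). Offline, each interval stores one upper bound per item (its maximum score over that interval), so storage grows with items × intervals rather than items × parameter values. At query time, items are scanned in descending bound order and the scan stops once no remaining bound can beat the current k-th score. The names `build_interval_index` and `topk_query`, and the room prices, are hypothetical.

```python
import heapq
from bisect import bisect_right
from typing import Callable, Dict, List, Sequence, Tuple

# An item's score as a function of the query parameter (higher is better).
ScoreFn = Callable[[int], float]


def build_interval_index(
    funcs: Dict[str, ScoreFn], boundaries: Sequence[int]
) -> List[List[Tuple[float, str]]]:
    """Offline step: for each interval [boundaries[j], boundaries[j+1]),
    keep one upper bound per item (its maximum score over the interval),
    sorted from best to worst bound."""
    index = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        bounds = [(max(f(p) for p in range(lo, hi)), item) for item, f in funcs.items()]
        bounds.sort(key=lambda t: (-t[0], t[1]))
        index.append(bounds)
    return index


def topk_query(
    funcs: Dict[str, ScoreFn],
    boundaries: Sequence[int],
    index: List[List[Tuple[float, str]]],
    p: int,
    k: int,
) -> Tuple[List[Tuple[float, str]], int]:
    """Online step: return the k best (score, item) pairs at parameter p and
    the number of score functions actually evaluated."""
    if k <= 0:
        return [], 0
    j = bisect_right(boundaries, p) - 1  # interval that contains p
    if not 0 <= j < len(index):
        raise ValueError(f"parameter {p} is outside the indexed range")
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top k
    evaluated = 0
    for bound, item in index[j]:
        # Items arrive in descending bound order, so once the best remaining
        # bound cannot beat the current k-th score, every later item is pruned.
        if len(heap) == k and bound <= heap[0][0]:
            break
        score = funcs[item](p)  # evaluate the item-specific function
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item))
    return sorted(heap, key=lambda t: (-t[0], t[1])), evaluated


# Hypothetical nightly room prices that depend on the number of nights n;
# the score is the negated price, so cheaper rooms rank higher.
ROOMS: Dict[str, ScoreFn] = {
    "double": lambda n: -(102 if n >= 3 else 120),  # 15% off from 3 nights
    "single": lambda n: -90,
    "suite": lambda n: -(150 if n >= 5 else 200),  # 25% off from 5 nights
    "hostel": lambda n: -(40 if n >= 7 else 50),  # 20% off from 7 nights
}
BOUNDARIES = [1, 3, 7, 15]  # intervals [1,3), [3,7), [7,15) of nights

INDEX = build_interval_index(ROOMS, BOUNDARIES)
top, evaluated = topk_query(ROOMS, BOUNDARIES, INDEX, p=5, k=2)
print(top, evaluated)  # [(-50, 'hostel'), (-90, 'single')] 2
```

For a 5-night query with k = 2, only two of the four score functions are evaluated. Coarser intervals shrink the index but loosen the upper bounds, so fewer items are pruned; this is the space-versus-pruning trade-off that a workload-aware choice of intervals would tune.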

Similar resources

Secure Top-k Query Processing on Encrypted Databases

Privacy concerns in outsourced cloud databases have become more and more important recently, and many efficient and scalable query processing methods over encrypted data have been proposed. However, there is very limited work on how to securely process top-k ranking queries over encrypted databases in the cloud. In this paper, we focus exactly on this problem: secure and efficient proce...

Skyline-enabled Storage System

This is a vision statement for a novel idea of enabling skyline operators over a large-scale file system, in order to support efficient and automated management of large file systems. As file systems grow rapidly, so does the complexity of the underlying storage systems, which often contain a variety of hardware with distinct performance profiles, e.g., hard disks, Flash memory based device...

TopX: efficient and versatile top-k query processing for text, structured, and semistructured data

TopX is a top-k retrieval engine for text and XML data. Unlike Boolean engines, it stops query processing as soon as it can safely determine the k top-ranked result objects according to a monotonic score aggregation function with respect to a multidimensional query. The main contributions of the thesis unfold into four main points, confirmed by previous publications at international conference...

Efficient Early Top-k Query Processing in Overloaded P2P Systems

Top-k query processing in P2P systems has focused on efficiently computing the top-k results while reducing network traffic and query response time. However, in overloaded P2P systems (with very high query loads), some peers may take a long time to answer, thus making the user wait a long time to obtain the final top-k result. In this paper, we address this problem, which we reformulate as earl...

Computing Immutable Regions for Subspace Top-k Queries

Given a high-dimensional dataset, a top-k query can be used to shortlist the k tuples that best match the user’s preferences. Typically, these preferences regard a subset of the available dimensions (i.e., attributes) whose relative significance is expressed by user-specified weights. Along with the query result, we propose to compute for each involved dimension the maximal deviation to the cor...

Journal title:
  • PVLDB

Volume: 1, Issue:

Pages: -

Publication date: 2008